🧠 CUDA Memory Management - miterion · Scour

AI in Multiple GPUs: Understanding the Host and Device Paradigm

towardsdatascience.com·1h

⏱️CUDA Events

Allocators from C to Zig

antonz.org·2h

📊Profiling Tools

Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues

semiengineering.com·6h

📈Occupancy Optimization

Two Ways to Move Tensors Without Stopping: Inside vLLM's Async GPU Transfer Patterns

dev.to·17h

🌊CUDA Streams

From Buffers to Registers: Unlocking Fine-Grained FlashAttention with Hybrid-Bonded 3D NPU Co-Design

arxiv.org·9h

⚡Flash Attention

Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching

arxiv.org·9h

⚡Flash Attention

anulum/sc-neurocore: Verified Rust-based Neuromorphic Compiler. 512x Real-Time Speed. Bit-True FPGA Equivalence. (AGPLv3 / Commercial)

github.com·2h

🎯Tensor Cores

Show HN: Solving Sudoku reasoning via Energy Geometric models

davisgeometric.com·4h

The system statistics collection daemon

collectd.org·18m

📊Profiling Tools

Live Update Orchestrator — The Linux Kernel documentation

docs.kernel.org·4h

📊Profiling Tools

How Programmers Spend Their Time

probablydance.com·1d

⚡Flash Attention

Benchmarking Malloc with Doom 3

forrestthewoods.com·4d

📊Profiling Tools

How Memory Technology Is Powering the Next Era of Compute

semiwiki.com·20h

⚙️Systems Programming

Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

chipublib.idm.oclc.org·22h

📊Gradient Accumulation

Zvec: SQLite-like simplicity in an embedded vector database (By Alibaba)

zvec.org·1h

The Efficiency Wall: Why the Next 1,000x Leap Isn’t More GPUs

pub.towardsai.net·10h

🌊CUDA Streams

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·6h

🎯Tensor Cores

Creeping memory allocation

community.folivora.ai·4d

📈Occupancy Optimization

Avoiding UB but "safe" data race in a lock-free slab allocator - help - The Rust Programming Language Forum

users.rust-lang.org·19h

⚡CUDA Programming Patterns

Agent Memory Storage: A Practical Guide

dev.to·15h

⚡ONNX Runtime
